Scaling Multiagent Markov Decision Processes
Abstract
1. THREE CURSES OF DIMENSIONALITY

Markov Decision Processes (MDPs) have proved to be useful and general models of optimal decision-making in uncertain domains. However, reinforcement learning approaches to solving MDPs that store the optimal value function and action models as tables do not scale to large state spaces. Three computational obstacles prevent the use of standard approaches on problems with many variables. First, the state space (and the time required for convergence) grows exponentially in the number of variables, which makes computing the value function impractical or impossible in terms of both memory and time. Second, the space of possible joint actions is exponential in the number of agents, so even one-step look-ahead search is computationally expensive. Third, exact computation of the expected value of the next state is slow, because the number of possible future states is exponential in the number of variables. These three obstacles are referred to as the three "curses of dimensionality".

Much prior work addresses scaling reinforcement learning to large state spaces, including many state abstraction and function approximation techniques. These techniques aim to reduce the number of parameters used to represent the value function, and thus to reduce memory requirements and time to converge. In addition, methods that incorporate prior knowledge can speed up convergence.

In [4] I addressed the three curses of dimensionality, providing a solution to each. To solve the problem of the exploding state space, I introduced a kind of function approximation called "tabular linear functions". To solve the action space explosion, I used a hill climbing technique over the action search space. To solve the problem of computing the expected value of the next state, I introduced ASH-learning, a model-based average-reward algorithm that uses afterstates to reduce the number of future states that must be examined.
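To make the second curse and the hill climbing idea concrete, the following is a minimal sketch, not the method of [4]. It assumes a one-agent-at-a-time neighborhood (change a single agent's action and keep the change if the joint value improves) and a hypothetical toy value function; the abstract does not specify either, so the names `hill_climb`, `exhaustive_search`, `toy_value`, `TARGETS`, and `N_ACTIONS` are illustrative only.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

JointAction = Tuple[int, ...]

# Hypothetical toy problem (an assumption, not from the source):
# each agent has N_ACTIONS choices and a preferred action in TARGETS.
N_ACTIONS = 4
TARGETS: JointAction = (3, 1, 0, 2, 1, 3)


def toy_value(joint: JointAction) -> int:
    """Toy joint-action value: 0 at TARGETS, negative elsewhere."""
    return -sum((a - t) ** 2 for a, t in zip(joint, TARGETS))


def exhaustive_search(
    n_agents: int, n_actions: int, value: Callable[[JointAction], float]
) -> JointAction:
    """Evaluate all n_actions ** n_agents joint actions (exponential)."""
    return max(product(range(n_actions), repeat=n_agents), key=value)


def hill_climb(
    n_agents: int,
    n_actions: int,
    value: Callable[[JointAction], float],
    start: Optional[Sequence[int]] = None,
) -> Tuple[JointAction, float, int]:
    """Greedy local search over joint actions.

    Repeatedly sweeps over the agents, trying each alternative action
    for one agent at a time and keeping any strict improvement.
    Returns (joint action, its value, number of value evaluations).
    """
    current = list(start) if start is not None else [0] * n_agents
    best = value(tuple(current))
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for agent in range(n_agents):
            for action in range(n_actions):
                if action == current[agent]:
                    continue
                candidate = current.copy()
                candidate[agent] = action
                score = value(tuple(candidate))
                evaluations += 1
                if score > best:
                    best, current, improved = score, candidate, True
    return tuple(current), best, evaluations


if __name__ == "__main__":
    joint, score, evals = hill_climb(len(TARGETS), N_ACTIONS, toy_value)
    print("hill climbing:", joint, score, "evaluations:", evals)
    print("joint actions in full search:", N_ACTIONS ** len(TARGETS))
```

Each sweep costs on the order of (number of agents) × (actions per agent) evaluations, whereas the exhaustive search enumerates every joint action. The toy value function here is separable across agents, so the local search reaches the global optimum; in general, hill climbing may stop at a local optimum, which is the trade-off for avoiding the exponential search.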
Similar resources
High level coordination of agents based on multiagent Markov decision processes with roles
We present an approach for coordinating the actions of a team of real world autonomous agents on a high level. The method extends the framework of multiagent Markov decision processes with the notion of roles, a flexible and natural way to give each member of the team a clear description of its task. A role in our framework defines the set of actions and the policy of an agent. Roles are a natu...
Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)
Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem s...
Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs
Many solution methods for Markov Decision Processes (MDPs) exploit structure in the problem and are based on value function factorization. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, restricting problem sizes and types that can be handled. We present an approach to mitigate this limitation for ...
Planning, Learning and Coordination in Multiagent Decision Processes
There has been a growing interest in AI in the design of multiagent systems, especially in multiagent cooperative planning. In this paper, we investigate the extent to which methods from single-agent planning and learning can be applied in multiagent settings. We survey a number of different techniques from decision-theoretic planning and reinforcement learning and describe a number of interest...
Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single agent systems as well as multiagent systems and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function and Q function are similarly symmetric and there exists a symmetric opt...
Multiagent Expedition with Graphical Models
We investigate a class of multiagent planning problems termed multiagent expedition, where agents move around an open, unknown, partially observable, stochastic, and physical environment, in pursuit of multiple and alternative goals of different utility. Optimal planning in multiagent expedition is highly intractable. We introduce the notion of conditional optimality, decompose the task into a s...